On DNA, RNA and protein sequence molecules, sites can be attached to a specific stretch of sequence; these are called sequence features. Internally, sequence features have the following type hierarchy:

  • Site -> SequenceFeature -> (PolynucleotideFeature,PolypeptideFeature)
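The hierarchy can be pictured as ordinary Python inheritance. The classes below are hypothetical empty stand-ins named after the documented types, not the wc_rules classes themselves. They only show that every concrete feature is also a Site, which is why it can be attached to a molecule like any other site.

```python
# Hypothetical stand-ins mirroring the documented hierarchy
# Site -> SequenceFeature -> (PolynucleotideFeature, PolypeptideFeature).
# These are NOT the wc_rules classes; they only illustrate the inheritance.
class Site:
    pass

class SequenceFeature(Site):
    pass

class PolynucleotideFeature(SequenceFeature):  # features on DNA/RNA
    pass

class PolypeptideFeature(SequenceFeature):  # features on proteins
    pass

# Print each concrete class with its ancestors (dropping the implicit `object`)
for cls in (PolynucleotideFeature, PolypeptideFeature):
    print(cls.__name__, [c.__name__ for c in cls.__mro__[:-1]])
```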

To create a sequence feature, call its constructor and attach it to a molecule as you would a regular site (here, with set_molecule()). Then use set_location() to specify the feature's location, passing either the start and end keyword arguments or the start and length keyword arguments.


In [1]:
from wc_rules.bioseq import DNA, PolynucleotideFeature
inputstr = 'TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA'

dna1 = DNA(ambiguous=False,id='dna1').set_sequence(inputstr)
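# Create two features and attach each one to dna1 as a site
feat1 = PolynucleotideFeature(id='feat1').set_molecule(dna1)
feat2 = PolynucleotideFeature(id='feat2').set_molecule(dna1)

# Same location, specified as (start, end) and as (start, length)
feat1.set_location(start=90,end=99)
feat2.set_location(start=90,length=9)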
print([x.get_id() for x in dna1.get_sites()])


['feat1', 'feat2']
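Both set_location() calls above describe the same stretch of sequence. A minimal sketch of how the two forms could be normalised, assuming end = start + length (consistent with the outputs in this notebook, where start=90 with length=9 gives end=99). resolve_location is a hypothetical helper, not part of wc_rules, and its bounds check is an added assumption.

```python
def resolve_location(start, end=None, length=None):
    """Hypothetical helper: normalise (start, end) or (start, length) to (start, end).

    Assumes end = start + length, i.e. an end-exclusive end position.
    """
    # Exactly one of end/length must be given
    if (end is None) == (length is None):
        raise ValueError("pass exactly one of end or length")
    if end is None:
        end = start + length
    # Assumed sanity check: non-negative start, end not before start
    if not 0 <= start <= end:
        raise ValueError("expected 0 <= start <= end")
    return start, end

print(resolve_location(90, end=99))    # (90, 99)
print(resolve_location(90, length=9))  # (90, 99)
```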

To get the location of a sequence feature, use get_location(). It returns a dict with the keys 'start' and 'end', both mapped to int values.


In [2]:
print(feat1.get_location())
print(feat2.get_location())


{'start': 90, 'end': 99}
{'start': 90, 'end': 99}

Alternatively, the start and end values can be accessed separately, either as the attributes start and end or with the methods get_start() and get_end().


In [3]:
print([feat1.start,feat1.end])
print([feat1.get_start(),feat1.get_end()])


[90, 99]
[90, 99]
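The three ways of reading a location (dict, attributes, getters) can be modelled with a small dataclass. ToyFeature is a hypothetical stand-in, not wc_rules' PolynucleotideFeature; it only mirrors the access patterns shown in the two cells above.

```python
from dataclasses import dataclass

@dataclass
class ToyFeature:
    """Hypothetical stand-in, NOT wc_rules' PolynucleotideFeature."""
    start: int
    end: int

    def get_start(self) -> int:
        return self.start

    def get_end(self) -> int:
        return self.end

    def get_location(self) -> dict:
        # Same shape as the documented output: {'start': ..., 'end': ...}
        return {"start": self.start, "end": self.end}

f = ToyFeature(start=90, end=99)
print([f.start, f.end])              # [90, 99]
print([f.get_start(), f.get_end()])  # [90, 99]
print(f.get_location())              # {'start': 90, 'end': 99}
```

Returning a dict whose keys match the start/end parameter names is a convenient design choice: the dict can be forwarded to another function with ** unpacking.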

To get the length of a sequence feature, use get_length().


In [4]:
print(feat1.get_length())
print(feat2.get_length())


9
9
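The length reported above is consistent with end - start under 0-based, end-exclusive positions (start=90, end=99 gives 9). A minimal sketch of that arithmetic, using a hypothetical helper location_length that works on the dict shape returned by get_location():

```python
def location_length(location: dict) -> int:
    """Hypothetical helper: length of a 0-based, end-exclusive location."""
    return location["end"] - location["start"]

loc = {"start": 90, "end": 99}  # same shape as get_location() output
print(location_length(loc))                  # 9
print(len(range(loc["start"], loc["end"])))  # 9: positions 90, 91, ..., 98
```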

To read the sequence covered by a feature, call get_sequence() on the parent molecule, passing the feature's start and end positions.


In [5]:
print(dna1.get_sequence(feat1.start,feat1.end))
print(dna1.get_sequence(feat1.get_start(),feat1.get_end()))


CAATACAGA
CAATACAGA
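The result can be cross-checked without wc_rules. Assuming positions are 0-based and end-exclusive, as in Python slicing, get_sequence(90, 99) should match a plain string slice of the same input; the check below uses only built-in str operations.

```python
# Input sequence copied from the first cell
inputstr = 'TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA'

start, end = 90, 99         # location reported by get_location()
print(len(inputstr))        # 99 nucleotides in total
print(inputstr[start:end])  # CAATACAGA: positions 90..98, the last nine bases
```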

Alternatively, because the keys of the dict returned by get_location() match the parameter names of get_sequence(), the dict can be unpacked with ** and passed directly to get_sequence().


In [6]:
print(dna1.get_sequence(**feat1.get_location()))


CAATACAGA
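The ** syntax is standard Python keyword-argument unpacking: each dict key is passed as the keyword argument of the same name. The function below is a hypothetical stand-in for the molecule's get_sequence(), written as a free function that takes the sequence explicitly; it shows why the keys must match the parameter names exactly.

```python
def get_sequence(seq: str, start: int, end: int) -> str:
    """Hypothetical stand-in: return seq[start:end] (0-based, end-exclusive)."""
    return seq[start:end]

loc = {"start": 2, "end": 5}
# Equivalent to get_sequence("ACGTACGT", start=2, end=5)
print(get_sequence("ACGTACGT", **loc))  # GTA

# A key that is not a parameter name is rejected by Python itself
bad_loc = {"start": 2, "stop": 5}
try:
    get_sequence("ACGTACGT", **bad_loc)
except TypeError as err:
    print("TypeError:", err)
```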